Encrypted statistical machine learning: new privacy preserving methods
We present two new statistical machine learning methods designed to learn on
fully homomorphic encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for applying extremely
random forests, involving a new cryptographic stochastic fraction estimator,
and naïve Bayes, involving a semi-parametric model for the class decision
boundary, and show how they can be used to learn and predict from encrypted
data. We demonstrate that these techniques perform competitively on a variety
of classification data sets and provide detailed information about the
computational practicalities of these and other FHE methods. Comment: 39 pages
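A plaintext illustration shows why extremely (completely) random forests suit encrypted data. If each split feature and threshold is drawn at random, independently of the data (assumed here, as is usual for completely random trees), then the tree structure is fixed before any data is seen, and training reduces to counting class labels in each leaf. This is a minimal unencrypted sketch under illustrative assumptions: features scaled to [0, 1], fixed-depth trees, and prediction by summing leaf counts across trees. All function names are hypothetical. It does not reproduce the paper's encrypted comparisons or its cryptographic stochastic fraction estimator.

```python
import random
from typing import List, Sequence, Tuple

Split = Tuple[int, float]  # (feature index, threshold)
Tree = List[Split]  # internal nodes of a complete binary tree, heap order
Forest = List[Tuple[Tree, List[List[int]]]]  # (tree, per-leaf class counts)


def random_tree(n_features: int, depth: int, rng: random.Random) -> Tree:
    """Draw every split at random, without looking at any data.
    Assumes all features are scaled to [0, 1]."""
    return [(rng.randrange(n_features), rng.random()) for _ in range(2 ** depth - 1)]


def leaf_index(tree: Tree, x: Sequence[float]) -> int:
    """Route x down the tree; leaves are numbered 0 .. len(tree)."""
    node = 0
    while node < len(tree):
        feature, threshold = tree[node]
        node = 2 * node + (2 if x[feature] > threshold else 1)
    return node - len(tree)


def fit_counts(tree: Tree, X: Sequence[Sequence[float]], y: Sequence[int],
               n_classes: int) -> List[List[int]]:
    """The only data-dependent step: per-leaf class counts, i.e. sums of
    0/1 indicators. Sums are cheap homomorphic operations; computing the
    indicators themselves under encryption is not shown here."""
    counts = [[0] * n_classes for _ in range(len(tree) + 1)]
    for x, label in zip(X, y):
        counts[leaf_index(tree, x)][label] += 1
    return counts


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[int], n_classes: int,
               n_trees: int, depth: int, seed: int = 0) -> Forest:
    rng = random.Random(seed)
    forest: Forest = []
    for _ in range(n_trees):
        tree = random_tree(len(X[0]), depth, rng)
        forest.append((tree, fit_counts(tree, X, y, n_classes)))
    return forest


def predict(forest: Forest, x: Sequence[float]) -> int:
    """Sum the leaf counts reached by x across all trees; return the top class."""
    n_classes = len(forest[0][1][0])
    totals = [0] * n_classes
    for tree, counts in forest:
        for c, k in enumerate(counts[leaf_index(tree, x)]):
            totals[c] += k
    return max(range(n_classes), key=lambda c: totals[c])


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.random()] for _ in range(400)]
    y = [int(row[0] > 0.5) for row in X]
    forest = fit_forest(X, y, n_classes=2, n_trees=100, depth=4)
    print(predict(forest, [0.02]), predict(forest, [0.98]))
```

Because the splits never depend on the training data, the same forest structure could in principle be shared before any encrypted data arrives. Only the counts would have to be accumulated on ciphertexts.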
Model updating after interventions paradoxically introduces bias
Machine learning is increasingly being used to generate prediction models for
use in a number of real-world settings, from credit risk assessment to clinical
decision support. Recent discussions have highlighted potential problems in the
updating of a predictive score for a binary outcome when an existing predictive
score forms part of the standard workflow, driving interventions. In this
setting, the existing score induces an additional causative pathway which leads
to miscalibration when the original score is replaced. We propose a general
causal framework to describe and address this problem, and demonstrate an
equivalent formulation as a partially observed Markov decision process. We use
this model to demonstrate the impact of such 'naive updating' when performed
repeatedly. Namely, we show that successive predictive scores may converge to a
point where they predict their own effect, or may eventually tend toward a
stable oscillation between two values, and we argue that neither outcome is
desirable. Furthermore, we demonstrate that even if model-fitting procedures
improve, actual performance may worsen. We complement these findings with a
discussion of several potential routes to overcome these issues. Comment: Sections of this preprint on 'Successive adjuvancy' (section 4,
theorem 2, figures 4 and 5, and associated discussions) were not included in the
originally submitted version of this paper due to length. This material does
not appear in the published version of this manuscript, and the reader should
be aware that these sections did not undergo peer review.
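A deliberately simple deterministic toy model illustrates the feedback loop behind 'naive updating'. It is not the paper's causal framework or its partially observed Markov decision process. Suppose each refitted score exactly equals the risk observed while the previous score was driving interventions, so that s_{t+1} = f(s_t) for a hypothetical effect function f. In this sketch, a graded intervention settles at a fixed point where the score predicts its own effect, and a threshold intervention produces a stable oscillation between two values. The baseline risk `P0`, both effect functions, and their constants are made up for illustration.

```python
from typing import Callable, List

P0 = 0.5  # hypothetical baseline risk when no intervention is made


def naive_updates(effect: Callable[[float], float], s0: float, n: int) -> List[float]:
    """Iterate s_{t+1} = effect(s_t): each refitted score equals the risk
    observed while the previous score was driving interventions."""
    scores = [s0]
    for _ in range(n):
        scores.append(effect(scores[-1]))
    return scores


def graded(s: float) -> float:
    """Hypothetical: intervention intensity grows with the predicted risk s."""
    return P0 * (1.0 - 0.5 * s)


def threshold(s: float) -> float:
    """Hypothetical: intervene only above a 0.3 cut-off; intervening halves the risk."""
    return P0 * 0.5 if s > 0.3 else P0


if __name__ == "__main__":
    print(naive_updates(graded, 0.5, 30)[-1])  # approaches the fixed point s* = 0.4
    print(naive_updates(threshold, 0.5, 4))  # [0.5, 0.25, 0.5, 0.25, 0.5]
```

In the graded case the fixed point solves s = 0.5(1 - 0.5s), giving s* = 0.4. That score matches the outcomes it has itself altered, yet it understates the untreated risk of 0.5.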
Uncertainty in engineering: introduction to methods and applications
This open access book provides an introduction to uncertainty quantification in engineering. Starting with preliminaries on Bayesian statistics and Monte Carlo methods, followed by material on imprecise probabilities, it then focuses on reliability theory and simulation methods for complex systems. The final two chapters discuss various aspects of aerospace engineering, considering stochastic model updating from an imprecise Bayesian perspective, and uncertainty quantification for aerospace flight modelling.
Written by experts in the subject and based on lectures given at the Second Training School of the European Research and Training Network UTOPIAE (Uncertainty Treatment and Optimization in Aerospace Engineering), which took place at Durham University (United Kingdom) from 2 to 6 July 2018, the book offers an essential resource for students as well as scientists and practitioners.
Optimal sizing of a holdout set for safe predictive model updating
Predictive risk scores are increasingly used to guide clinical or other
interventions in complex settings, particularly healthcare. Directly updating a
risk score used to guide interventions leads to biased risk estimates. We
propose updating using a 'holdout set' (a subset of the population that does
not receive risk-score-guided interventions) to prevent this. Since samples
in the holdout set do not benefit from risk predictions, its size must trade
off the performance of the updated risk score against the number of held-out
samples. We prove that this approach outperforms simple alternatives, and,
by defining a general loss function, describe conditions under which an optimal
holdout size (OHS) can be readily identified. We introduce parametric and
semi-parametric algorithms for OHS estimation and demonstrate their use on a
recent risk score for pre-eclampsia. Based on these results, we argue that a
holdout set is a safe, viable and easily implemented means of updating
predictive risk scores. Comment: Manuscript includes supplementary materials and figures
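A simple cost model makes the holdout-size trade-off concrete. This model is an illustration, not the paper's exact loss. Each held-out sample incurs a cost k1 because it receives no score-guided intervention. Each of the remaining N - n samples incurs k2(n), the per-sample cost under a score refitted on n holdout samples, modelled here with a hypothetical power-law learning curve k2(n) = a n^(-b) + c. The total L(n) = k1 n + k2(n)(N - n) is then minimised by brute force over integer holdout sizes. All parameter values and function names below are made up.

```python
from typing import Callable


def total_cost(n: int, N: int, k1: float, k2: Callable[[int], float]) -> float:
    """Hypothetical loss: n held-out samples cost k1 each; the other N - n
    samples cost k2(n), the per-sample cost under a score refitted on n samples."""
    return k1 * n + k2(n) * (N - n)


def learning_curve(n: int, a: float = 0.5, b: float = 0.5, c: float = 0.02) -> float:
    """Assumed power-law learning curve: cost falls towards c as n grows."""
    return a * n ** (-b) + c


def optimal_holdout_size(N: int, k1: float, k2: Callable[[int], float]) -> int:
    """Brute-force argmin of total_cost over holdout sizes 1..N."""
    return min(range(1, N + 1), key=lambda n: total_cost(n, N, k1, k2))


if __name__ == "__main__":
    N, k1 = 10_000, 0.1  # made-up population size and per-holdout-sample cost
    n_opt = optimal_holdout_size(N, k1, learning_curve)
    print(n_opt, total_cost(n_opt, N, k1, learning_curve))
```

With these made-up parameters the minimum is an interior one, at roughly a thousand held-out samples out of 10,000. Holding out too few leaves the refitted score poor, and holding out too many denies the score's benefit to too many people. In practice, quantities like k2(n) are not known in advance and must be estimated; that is the setting the parametric and semi-parametric OHS estimation algorithms described above address.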